Advanced Geospatial Data Processing for Social Scientists
Dennis Abel & Stefan Jünger
2025-04-28
Day
Time
Title
April 28
10:00-11:15
Introduction
April 28
11:15-11:30
Coffee Break
April 28
11:30-13:00
Raster data in R
April 28
13:00-14:00
Lunch Break
April 28
14:00-15:15
Raster data processing
April 28
15:15-15:30
Coffee Break
April 28
15:30-17:00
Graphical display of raster data in maps
April 29
10:00-11:15
Datacube processing I
April 29
11:15-11:30
Coffee Break
April 29
11:30-13:00
Datacube processing II & API access
April 29
13:00-14:00
Lunch Break
April 29
14:00-15:15
Data integration and linking (with survey data)
April 29
15:15-15:30
Coffee Break
April 29
15:30-17:00
Outlook and open session with own application
Our plan
Thus far, we have learned about
Different data formats
How to load them
First steps in interacting with them
In this session, you will learn
How to wrangle raster data even further
Linking to vector data
Manipulating the raster values
How to converse from one format into the other
Subsetting
Cropping raster data
Cropping is a method of cutting out a specific ‘slice’ of a raster layer based on an input dataset or geospatial extent, such as a bounding box. We often do this to ‘zoom’ in on a dataset or to make our computations more efficient. Let’s pretend, we are mainly interested in the small-scale population size of Eastern Germany. For this purpose, we can use the bounding box of Eastern Germany.
Bounding box in R
The easiest way (in my opinion) to create a bounding box in R is to use the sf::st_bbox() function, possibly based on another geospatial dataset (and not ChatGPT).
If we only want to add one attribute from a vector dataset Y to another vector dataset X, we can conduct a spatial join using sf::st_join() as shown earlier. In the raster data world, these operations are called raster extractions.
Raster data are helpful when we aim to
Apply calculations that are the same for all geometries in the dataset
Extract information from the raster fast and efficient
Pulling in some data
For this effort, we re-import the synthetic survey geocoordinates. We sample 100 geocoordinates from the whole dataset, as we only need a few for demonstrating the procedures that are following.
There’s a multitude of arguments that we can adjust to conduct the extraction. An important option is on from which raster cells we want our extraction to be done. Here’s an example how the argument terra::extract(..., touches = FALSE/TRUE) works.
touches = FALSE vs. touches = TRUE
Let’s see how the values differ when we apply the option or not.
Often we default on the mean of raster cell values to be extracted. As we deal with population counts in our example, maybe calculating the sum is more relevant. Or the maximum?
We can use the same procedure to aggregate a raster dataset into a vector polygon dataset. That’s a widespread use case. Let’s load our German districts vector dataset in ./data/. And this time, we use raster data on gender ratios from 2020.
Again, we simply use terra::extract() to create the aggregated data, which we then can add to our vector dataset.
german_districts$gender_ratios_2020 <- terra::extract( gender_ratios_2020, german_districts, fun = mean, na.rm =TRUE, ID =FALSE ) |>unlist()plot(german_districts["gender_ratios_2020"])
Manipulating the raster data
Digging deeper
The previous steps are more or less efforts than work on the raw raster cell values data. However, there are occasions where we might want to work on the raster data themselves or at least pre-process them to add them to another dataset (e.g., a vector file). Some of these procedures are for visualization, such as heatmaps, others are necessary for our later analyses. We will show a few in the following.
Creating a quick ‘heatmap’
Population counts for the whole of Germany helps us to identify urban and rural clusters. But there may be applications where we need to zoom in on a specific city to identify within-city variations, also regarding foreinger shares. Let’s do that for the German capital Berlin. For this purpose, we subset the district data and also mask and crop data on foreigner shares from the German census.
Although, we already can identify high density areas using the raw data, some smoothing may be helpful. We can use the terra::focal() function to do that. It applies a moving window filter on all raster cells of a grid. We’ll have a more detailed look at this function in a minute.
foreigners_berlin_smooth <- terra::focal( foreigners_berlin, w =matrix(1, 5, 5), fun = mean, na.rm =TRUE )plot(foreigners_berlin_smooth)
Real point pattern analysis
Usually, when we talk about heat maps we mean analyzing point patterns and whether they spatially cluster. terra might not actually be your best choice when it comes to more elaborated techniques to estimate density kernels and distance base bandwidths between points to draw clusters. For this purpose, packages such as spatstat are well more suited but they require learning about other data structures.[^https://r-spatial.org/book/11-PointPattern.html] That said, densities are more advanced ways of counting things in raster grid cells, and we can mimic this behavior also with terra. So let’s stick to this package and crop our points to the extent of Berlin.
Next, for our density estimation we simply want to count points with the attribute foreigner = 1 in raster grid cells. However, our point are unfortunately a bit sparse compared to our comprehensive raster dataset. We cannot rely on 1 km² grid cells initially as with the foreigner grid data. But 5 km² may be a good compromise. Let’s do that!
You may wonder why we are doing that. The answer is simple: We want to count the number of points in each of the grid cells. We use the function terra::rasterize() as a simple technique.
Now, you may think this is disappointing. These data looks… not good. But fear not, working with raster data is powerful, as we now use a function you already now for projecting one CRS into another: terra::project(). This function can also be used to aggregate and disaggregate data based on the structure of another dataset. So, while we were not able to use 1 km² grid cells for our density ‘estimation’ initially, we can reproject our 5 km² onto a 1 km² grid like our foreigners grid. A bit of masking also helps in getting rid of cells that are not within the border of Berlin
We can also apply smoothing as with the population grid data.
terra::focal( foreigners_berlin, w =matrix(1, 5, 5), fun = mean, na.rm =TRUE) |>plot()
focal( points_berlin_density, w =matrix(1, 5, 5), fun = mean,na.rm =TRUE) |>plot()
Playing with the smoothing
terra::focal( foreigners_berlin, w =matrix(1, 3, 3), fun = mean, na.rm =TRUE) |>plot()
focal( points_berlin_density, w =matrix(1, 3, 3), fun = mean,na.rm =TRUE) |>plot()
Playing with the smoothing
terra::focal( foreigners_berlin, w =matrix(1, 9, 9), fun = mean, na.rm =TRUE) |>plot()
focal( points_berlin_density, w =matrix(1, 9, 9), fun = mean,na.rm =TRUE) |>plot()
What is this argument w?
It’s magic. Just kidding, but it is indeed really powerful. And it makes weird stuff, particularly with our point based densities. It builds around the idea of connecting the value of a focal grid cell to the values of surrounding grid cells, as in the figure below. Hence, the name of the function terra::focal() where the argument is used
It’s just a simple base R matrix
When using this matrix as input and applying the statistic fun = mean, we simply change the value of the focal grid cell to the mean of the values of itself and the 8 surrounding grid cells. That’s what we did before in one example.
terra::focal( foreigners_berlin, w =matrix(1, 5, 5), fun = mean, na.rm =TRUE) |>plot()
focal( foreigners_berlin, w = weighted_w, fun = mean,na.rm =TRUE) |>plot()
Real life example: Edges of immigrant rates
In ethnic diversity research, it is a relevant question as to whether sudden changes in the neighborhood composition may rise the potential of conflicts between groups. Researchers use edge detection algorithms from image processing to investigate such changes spatially.
What is edge detection?
Edge detection identifies sudden changes in colors in an image to ‘draw’ borders of things that are in a picture. Here’s an example of the prominent Sobel filter.